@KseniyaTikhomirova
Collaborator

@KseniyaTikhomirova KseniyaTikhomirova commented Jun 5, 2025

This is part of the SYCL support upstreaming effort. The relevant RFCs can
be found here:

https://discourse.llvm.org/t/rfc-add-full-support-for-the-sycl-programming-model/74080
https://discourse.llvm.org/t/rfc-sycl-runtime-upstreaming/74479

The SYCL runtime is device-agnostic and uses Unified Runtime (UR,
oneapi-src/unified-runtime on GitHub) as an external dependency. UR
serves as an interface layer between the SYCL runtime and device-specific
backends, and it has several adapters that bind to various backends.

NOTE: UR is considered a temporary solution until llvm-project/offload is
fully functional and able to replace UR.

This commit adds:

- fetching UR,
- building UR as a dependency,
- a document with a short overview of UR, with links to repos and
  documentation.

@KseniyaTikhomirova
Collaborator Author

For reviewers: an example of the install structure for the L0 (Level Zero) adapter:

.../ssfork_llvm$ ninja -C $build_llvm install | grep "level_zero"
-- Up-to-date: /localdisk2/ktikhomi/repo/ssfork_llvm/install/release/./include/level_zero/ze_api.h
-- Up-to-date: /localdisk2/ktikhomi/repo/ssfork_llvm/install/release/./include/level_zero/ze_ddi.h
-- Up-to-date: /localdisk2/ktikhomi/repo/ssfork_llvm/install/release/./include/level_zero/ze_ddi_common.h
-- Up-to-date: /localdisk2/ktikhomi/repo/ssfork_llvm/install/release/./include/level_zero/zes_api.h
-- Up-to-date: /localdisk2/ktikhomi/repo/ssfork_llvm/install/release/./include/level_zero/zes_ddi.h
-- Up-to-date: /localdisk2/ktikhomi/repo/ssfork_llvm/install/release/./include/level_zero/zet_api.h
-- Up-to-date: /localdisk2/ktikhomi/repo/ssfork_llvm/install/release/./include/level_zero/zet_ddi.h
-- Up-to-date: /localdisk2/ktikhomi/repo/ssfork_llvm/install/release/./include/level_zero/layers/zel_tracing_api.h
-- Up-to-date: /localdisk2/ktikhomi/repo/ssfork_llvm/install/release/./include/level_zero/layers/zel_tracing_ddi.h
-- Up-to-date: /localdisk2/ktikhomi/repo/ssfork_llvm/install/release/./include/level_zero/layers/zel_tracing_register_cb.h
-- Up-to-date: /localdisk2/ktikhomi/repo/ssfork_llvm/install/release/./include/level_zero/loader/ze_loader.h
-- Up-to-date: /localdisk2/ktikhomi/repo/ssfork_llvm/install/release/share/doc/Runtimes/examples/common/examples_level_zero_helpers.h
-- Up-to-date: /localdisk2/ktikhomi/repo/ssfork_llvm/install/release/share/doc/Runtimes/examples/common/examples_level_zero_helpers.c
-- Up-to-date: /localdisk2/ktikhomi/repo/ssfork_llvm/install/release/share/doc/Runtimes/examples/level_zero_shared_memory
-- Up-to-date: /localdisk2/ktikhomi/repo/ssfork_llvm/install/release/share/doc/Runtimes/examples/level_zero_shared_memory/level_zero_shared_memory.c
-- Up-to-date: /localdisk2/ktikhomi/repo/ssfork_llvm/install/release/share/doc/Runtimes/examples/level_zero_shared_memory/CMakeLists.txt
-- Up-to-date: /localdisk2/ktikhomi/repo/ssfork_llvm/install/release/share/doc/Runtimes/examples/ipc_level_zero
-- Up-to-date: /localdisk2/ktikhomi/repo/ssfork_llvm/install/release/share/doc/Runtimes/examples/ipc_level_zero/ipc_level_zero.c
-- Up-to-date: /localdisk2/ktikhomi/repo/ssfork_llvm/install/release/share/doc/Runtimes/examples/ipc_level_zero/CMakeLists.txt
-- Up-to-date: /localdisk2/ktikhomi/repo/ssfork_llvm/install/release/include/umf/providers/provider_level_zero.h
-- Up-to-date: /localdisk2/ktikhomi/repo/ssfork_llvm/install/release/lib/libur_adapter_level_zero.so.0.12.0
-- Up-to-date: /localdisk2/ktikhomi/repo/ssfork_llvm/install/release/lib/libur_adapter_level_zero.so.0
-- Up-to-date: /localdisk2/ktikhomi/repo/ssfork_llvm/install/release/lib/libur_adapter_level_zero.so
-- Up-to-date: /localdisk2/ktikhomi/repo/ssfork_llvm/install/release/lib/libur_adapter_level_zero.so.0.12.0
-- Up-to-date: /localdisk2/ktikhomi/repo/ssfork_llvm/install/release/lib/libur_adapter_level_zero.so.0
-- Up-to-date: /localdisk2/ktikhomi/repo/ssfork_llvm/install/release/lib/libur_adapter_level_zero.so

@KseniyaTikhomirova
Collaborator Author

KseniyaTikhomirova commented Jun 5, 2025

@tahonermann, @dvrogozh, @asudarsa, @aelovikov-intel
I consider this PR the next step after the very first PR, which added the libsycl project structure: #1. That PR has not been published upstream yet, but we did agree on its content.

It would be nice to start reviewing this step early.

@KseniyaTikhomirova
Collaborator Author

kindly ping: @tahonermann, @dvrogozh, @asudarsa, @aelovikov-intel
For some reason I can't add you to this PR as reviewers

set(UMF_LINK_HWLOC_STATICALLY ON CACHE INTERNAL "static HWLOC")
endif()

fetch_adapter_source(level_zero


You note in the PR description that "UR is considered a temporary solution until llvm-project/offload is fully functional and able to replace UR". I am afraid that if it is merged, it may well become a permanent solution that will be quite hard to remove. For example, the existing UR depends on 4 adapters - are you sure that the code for all adapters will be accepted easily, or at all, into the llvm-project/offload project? I do not believe that such a temporary solution is the right approach. Instead, it's better to focus on llvm-project/offload directly, limit the scope of initial support (Intel GPUs), and go from there.

Collaborator Author

@sergey-semenov could you please help answer this? This approach was discussed before I joined the upstreaming activity.

AFAIK, the presence of UR in upstream was discussed and was not warmly received by the community, although folks did agree to start with UR.


Good point @dvrogozh
I see the email discussions and RFC discussions about this issue. But I am not able to find any communication on what we agreed on.

Owner

I don't think we've had any pushback on UR as a short-term dependency to unblock rt upstreaming in the RFCs (beyond being asked to run this by the LLVM board, which we have). I believe the current plan is to bring liboffload to functional parity with UR this year, which is when we're going to switch to it in both intel/llvm and upstream. @RaviNarayanaswamy @alycm please correct me if I'm wrong on any of this.

@RaviNarayanaswamy RaviNarayanaswamy Jun 18, 2025

@sergey-semenov is correct. liboffload is being worked on; currently most of the contributions come from CodePlay. For the short term, there was no objection from the community to using UR for offloading.


Codeplay, unless CodeSourcery are helping too! :-)

But yes, we're working on liboffload. Using liboffload is the long-term goal, but it is not yet mature enough to fully support SYCL-RT.

There is a liboffload adapter in Unified Runtime, so you can run SYCL-RT --> Unified Runtime --> liboffload. We're using this to drive development and for testing. But most SYCL features don't work yet.

@asudarsa

> kindly ping: @tahonermann, @dvrogozh, @asudarsa, @aelovikov-intel For some reason I can't add you to this PR as reviewers

Hi @KseniyaTikhomirova, thanks for the ping. I will look at this today.

@@ -0,0 +1,26 @@
=====================


Nit: Any reason why this is not directly under docs?

Thanks

Collaborator Author

UR is an implementation detail (design) of the SYCL RT. I believe that directly under docs we should keep user-visible things like guides, FAQ, release notes, and so on.
libcxx also splits its documents this way: https://github.com/llvm/llvm-project/tree/main/libcxx/docs
The intel/llvm split is very similar: https://github.com/intel/llvm/blob/sycl/sycl/doc/design/UnifiedRuntime.md

@KseniyaTikhomirova
Collaborator Author

@tahonermann, @dvrogozh, @asudarsa, @aelovikov-intel, @sergey-semenov
I believe the question about UR presence is answered. Kindly ping to review & approve if you have no objections.


.. _unified runtime:

Overview


We can avoid a sub-section (Overview) here. We can add this if we add more details to this document.

Thanks

Collaborator Author

removed in 7fb24ad

Overview
========

The Unified Runtime project serves as an interface layer between the SYCL


Suggested change
- The Unified Runtime project serves as an interface layer between the SYCL
+ The Unified Runtime (UR) project serves as an interface layer between the SYCL

Collaborator Author

updated in 7fb24ad

@asudarsa asudarsa left a comment

Document changes look good. Couple of nits.

Thanks

bader pushed a commit that referenced this pull request Jun 27, 2025
LLVM prevents the sm_32_intrinsics.hpp header from being included with a
#define SM_32_INTRINSICS_HPP. It also provides drop-in replacements of
the functions defined in the CUDA header.

One issue is that some intrinsics were added after the replacement was
written, and thus have no replacement, breaking code that calls them
(Raft is one example).

This commit backports the code from sm_32_intrinsics.hpp for the missing
intrinsics.

This is the second try after PR llvm#143664 broke tests.
bader pushed a commit that referenced this pull request Jun 27, 2025
The function already exposes a work list to avoid deep recursion, this
commit starts utilizing it in a helper that could also lead to a deep
recursion.

We have observed this crash on `clang/test/C/C99/n590.c` with our
internal builds that enable aggressive optimizations and hit the limit
earlier than default release builds of Clang.

See the added test for an example with a deeper recursion that used to
crash in upstream Clang before this change with the following stack
trace:

```
  #0 llvm::sys::PrintStackTrace(llvm::raw_ostream&, int) /usr/local/google/home/ibiryukov/code/llvm-project/llvm/lib/Support/Unix/Signals.inc:804:13
  #1 llvm::sys::RunSignalHandlers() /usr/local/google/home/ibiryukov/code/llvm-project/llvm/lib/Support/Signals.cpp:106:18
  #2 SignalHandler(int, siginfo_t*, void*) /usr/local/google/home/ibiryukov/code/llvm-project/llvm/lib/Support/Unix/Signals.inc:0:3
  #3 (/lib/x86_64-linux-gnu/libc.so.6+0x3fdf0)
  #4 AnalyzeImplicitConversions(clang::Sema&, clang::Expr*, clang::SourceLocation, bool) /usr/local/google/home/ibiryukov/code/llvm-project/clang/lib/Sema/SemaChecking.cpp:12772:0
  #5 CheckCommaOperand /usr/local/google/home/ibiryukov/code/llvm-project/clang/lib/Sema/SemaChecking.cpp:0:3
  #6 AnalyzeImplicitConversions /usr/local/google/home/ibiryukov/code/llvm-project/clang/lib/Sema/SemaChecking.cpp:12644:7
  #7 AnalyzeImplicitConversions(clang::Sema&, clang::Expr*, clang::SourceLocation, bool) /usr/local/google/home/ibiryukov/code/llvm-project/clang/lib/Sema/SemaChecking.cpp:12776:5
  #8 CheckCommaOperand /usr/local/google/home/ibiryukov/code/llvm-project/clang/lib/Sema/SemaChecking.cpp:0:3
  #9 AnalyzeImplicitConversions /usr/local/google/home/ibiryukov/code/llvm-project/clang/lib/Sema/SemaChecking.cpp:12644:7
 #10 AnalyzeImplicitConversions(clang::Sema&, clang::Expr*, clang::SourceLocation, bool) /usr/local/google/home/ibiryukov/code/llvm-project/clang/lib/Sema/SemaChecking.cpp:12776:5
 #11 CheckCommaOperand /usr/local/google/home/ibiryukov/code/llvm-project/clang/lib/Sema/SemaChecking.cpp:0:3
 #12 AnalyzeImplicitConversions /usr/local/google/home/ibiryukov/code/llvm-project/clang/lib/Sema/SemaChecking.cpp:12644:7
 #13 AnalyzeImplicitConversions(clang::Sema&, clang::Expr*, clang::SourceLocation, bool) /usr/local/google/home/ibiryukov/code/llvm-project/clang/lib/Sema/SemaChecking.cpp:12776:5
 #14 CheckCommaOperand /usr/local/google/home/ibiryukov/code/llvm-project/clang/lib/Sema/SemaChecking.cpp:0:3
 #15 AnalyzeImplicitConversions /usr/local/google/home/ibiryukov/code/llvm-project/clang/lib/Sema/SemaChecking.cpp:12644:7
 #16 AnalyzeImplicitConversions(clang::Sema&, clang::Expr*, clang::SourceLocation, bool) /usr/local/google/home/ibiryukov/code/llvm-project/clang/lib/Sema/SemaChecking.cpp:12776:5
 #17 CheckCommaOperand /usr/local/google/home/ibiryukov/code/llvm-project/clang/lib/Sema/SemaChecking.cpp:0:3
 #18 AnalyzeImplicitConversions /usr/local/google/home/ibiryukov/code/llvm-project/clang/lib/Sema/SemaChecking.cpp:12644:7
 #19 AnalyzeImplicitConversions(clang::Sema&, clang::Expr*, clang::SourceLocation, bool) /usr/local/google/home/ibiryukov/code/llvm-project/clang/lib/Sema/SemaChecking.cpp:12776:5
... 700+ more stack frames.
```
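
The general technique that commit message describes, replacing recursion with an explicit work list, can be sketched as follows. This is only an illustration under stated assumptions: `Expr`, `visitAll`, and `visitDeepChain` are hypothetical names and are not taken from Clang's `SemaChecking.cpp`.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for an AST node; this is not clang::Expr.
struct Expr {
  const Expr *LHS = nullptr;
  const Expr *RHS = nullptr;
};

// Visits every node reachable from Root. Pending nodes live in a
// heap-allocated vector, so nesting depth does not consume call stack.
inline std::size_t visitAll(const Expr *Root) {
  std::vector<const Expr *> WorkList;
  if (Root)
    WorkList.push_back(Root);

  std::size_t Visited = 0;
  while (!WorkList.empty()) {
    const Expr *E = WorkList.back();
    WorkList.pop_back();
    assert(E && "the work list never holds null nodes");
    ++Visited;
    // Where a recursive version would call itself, enqueue the children.
    if (E->LHS)
      WorkList.push_back(E->LHS);
    if (E->RHS)
      WorkList.push_back(E->RHS);
  }
  return Visited;
}

// Builds a left-leaning chain of Depth nodes (similar in shape to a long
// run of nested comma operators) and visits it.
inline std::size_t visitDeepChain(std::size_t Depth) {
  std::vector<Expr> Nodes(Depth);
  for (std::size_t I = 1; I < Depth; ++I)
    Nodes[I].LHS = &Nodes[I - 1];
  return Depth ? visitAll(&Nodes.back()) : 0;
}
```

A recursive visitor over the same chain would need one stack frame per nesting level, which is the failure mode visible in the trace above.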
AmrDeveloper and others added 15 commits July 2, 2025 19:25
This change adds support for the not equal operation for ComplexType

llvm#141365
Summary:
The allocator interface is supposed to have 16 byte alignment (to keep
it consistent with the CPU allocator; we could probably drop this to 8
if desired). But this was not enforced because the number of bytes used
for the bitfield sometimes resulted in alignment of 8 instead of 16.
Explicitly align the number of bytes to be a multiple of 16 even if
unused.
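
A minimal sketch of the rounding step that summary describes, assuming the fix amounts to rounding a byte count up to the next multiple of 16. The names `kAllocAlignment` and `alignTo16` are hypothetical and do not come from the allocator's source.

```cpp
#include <cstddef>

// Assumed alignment requirement, taken from the summary above.
constexpr std::size_t kAllocAlignment = 16;

// Rounds Bytes up to the next multiple of kAllocAlignment.
constexpr std::size_t alignTo16(std::size_t Bytes) {
  return (Bytes + kAllocAlignment - 1) / kAllocAlignment * kAllocAlignment;
}

// A 24-byte bitfield is only 8-byte aligned; padding it to 32 bytes keeps
// whatever follows it on a 16-byte boundary.
static_assert(alignTo16(24) == 32, "24 bytes are padded to 32");
static_assert(alignTo16(16) == 16, "already aligned sizes are unchanged");
```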
PR llvm#141106 changed the debuginfo metadata to allow dynamic bit offsets
and sizes. This caused a crash in lld when using LTO.

The problem is that lazyLoadOneMetadata assumes that the metadata in
question can be cast to MDNode; but in the typical case where the offset
is a constant, this is not true.

This patch changes this spot to allow non-MDNodes through.

The included test case comes from the report in llvm#141106.
…BB_ADDR_MAP_V0). (llvm#146186)

Version 2 was added more than two years ago
(llvm@6015a04).
So it should be safe to deprecate older versions.
This patch fixes:

  lldb/source/Plugins/ObjectFile/Mach-O/ObjectFileMachO.cpp:415:7:
  error: label at end of compound statement is a C++23 extension
  [-Werror,-Wc++23-extensions]

  lldb/source/Plugins/ObjectFile/Mach-O/ObjectFileMachO.cpp:536:7:
  error: label at end of compound statement is a C++23 extension
  [-Werror,-Wc++23-extensions]

  lldb/source/Plugins/ObjectFile/Mach-O/ObjectFileMachO.cpp:672:7:
  error: label at end of compound statement is a C++23 extension
  [-Werror,-Wc++23-extensions]
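
For context, this diagnostic fires when a label such as `default:` is the last thing before a closing brace. The sketch below shows the pattern and one common way to silence it. `widthFor` is a hypothetical function, and the actual code in `ObjectFileMachO.cpp` may look different.

```cpp
// Hypothetical example; not the code from ObjectFileMachO.cpp.
constexpr int widthFor(int Kind) {
  int Width = 0;
  switch (Kind) {
  case 1:
    Width = 4;
    break;
  case 2:
    Width = 8;
    break;
  default:
    // Before C++23 a label must be followed by a statement, so ending the
    // switch body with a bare `default:` triggers -Wc++23-extensions.
    // An explicit `break;` (or an empty statement `;`) avoids that.
    break;
  }
  return Width;
}
```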
This patch introduces a new custom type `!spirv.arm.tensor<>` to the
MLIR SPIR-V dialect to represent
`OpTypeTensorARM` as defined in the `SPV_ARM_tensors` extension.

The type models a shaped tensor with element type and optional shape,
and implements the
`ShapedType` interface to enable reuse of MLIR's existing shape-aware
infrastructure.

The type supports serialization to and from SPIR-V binary as
`OpTypeTensorARM`, and emits the
required capability (`TensorsARM`) and extension (`SPV_ARM_tensors`)
declarations automatically.

This addition lays the foundation for supporting structured tensor
values natively in SPIR-V and
will enable future support for operations defined in the
`SPV_ARM_tensors` extension, such as
`OpTensorReadARM`, `OpTensorWriteARM`, and `OpTensorQuerySizeARM`.

Reference: KhronosGroup/SPIRV-Registry#342

---------

Signed-off-by: Davide Grohmann <[email protected]>
Signed-off-by: Mohammadreza Ameri Mahabadian <[email protected]>
…/isGuaranteedNotToBeUndefOrPoisonForTargetNode (llvm#146728)

None of these implicitly generate UNDEF/POISON
The only use of Receiver is to initialize RecExpr.  This patch renames
Receiver to RecExpr while removing the cast statement.
)

This patch fixes the following error:
```
llvm/lib/Support/TextEncoding.cpp:274:11: error: cannot initialize a variable of type 'char *' with an rvalue of type 'const char *'
  274 |     char *Input = InputLength ? const_cast<char *>(Source.data()) : "";
      |           ^       ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
```
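
The error arises because the two arms of the conditional have types `char *` and `const char *`, so the whole expression is a `const char *`. One possible way to keep a mutable pointer in both arms is sketched below. `mutableInput` is a hypothetical helper, and this is not necessarily the fix that was applied in `TextEncoding.cpp`.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper; the real patch may have solved this differently.
inline char *mutableInput(const char *Data, std::size_t Length) {
  // `Length ? const_cast<char *>(Data) : ""` has type `const char *`: the
  // string literal decays to `const char *`, and the conditional operator
  // yields the more cv-qualified pointer type. Using a mutable empty buffer
  // keeps both arms of type `char *`.
  static char Empty[] = "";
  assert((Length == 0 || Data != nullptr) && "non-empty input needs a buffer");
  return Length ? const_cast<char *>(Data) : Empty;
}
```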
In DXC, there is an option to enable all KHR extensions. I would like to
extend the existing `-spirv-ext` backend command-line option to have the
same capability. It is like the special case for `all`, except it only
adds the `SPV_KHR_*` extensions.

Part of llvm#137650.
This reverts commit 988876c.
Was intended to be a PR
Refactors new/delete interceptor macros per the discussion in llvm#145087.

Signed-off-by: Justin King <[email protected]>
DavidSpickett and others added 21 commits July 4, 2025 09:02
…on (llvm#138144)

Background:
https://discourse.llvm.org/t/rfc-explaining-release-package-types-and-purposes/85985

So that users can understand which they should use, particularly for
Windows. The original text about community builds is kept, after
explaining the main release package formats.

In addition, explain how to use gpg or gh to verify the packages.
…lvm#146909)

The only difference is that with libc++ the summary string contains the
dereferenced pointer value. With libstdc++ we currently display the
pointer itself, which seems redundant. E.g.,
```
(std::unique_ptr<int>) iup = 0x55555556d2b0 {
  pointer = 0x000055555556d2b0
}
(std::unique_ptr<std::basic_string<char> >) sup = 0x55555556d2d0 {
  pointer = "foobar"
}
```

This patch moves the logic into a common helper that's shared between
the libc++ and libstdc++ formatters.

After this patch we can combine the libc++ and libstdc++ API tests (see
llvm#146740).
…ch64 macOS version

Currently failing on the arm64 macOS CI with:
```
06:59:37  Traceback (most recent call last):
06:59:37    File "/Users/ec2-user/jenkins/workspace/llvm.org/lldb-cmake-sanitized/llvm-project/lldb/test/API/commands/frame/var-dil/basics/GlobalVariableLookup/TestFrameVarDILGlobalVariableLookup.py", line 47, in test_frame_var
06:59:37      self.expect_var_path("ExtStruct::static_inline", value="16")
06:59:37    File "/Users/ec2-user/jenkins/workspace/llvm.org/lldb-cmake-sanitized/llvm-project/lldb/packages/Python/lldbsuite/test/lldbtest.py", line 2589, in expect_var_path
06:59:37      value_check.check_value(self, eval_result, str(eval_result))
06:59:37    File "/Users/ec2-user/jenkins/workspace/llvm.org/lldb-cmake-sanitized/llvm-project/lldb/packages/Python/lldbsuite/test/lldbtest.py", line 301, in check_value
06:59:37      test_base.assertSuccess(val.GetError())
06:59:37    File "/Users/ec2-user/jenkins/workspace/llvm.org/lldb-cmake-sanitized/llvm-project/lldb/packages/Python/lldbsuite/test/lldbtest.py", line 2597, in assertSuccess
06:59:37      self.fail(self._formatMessage(msg, "'{}' is not success".format(error)))
06:59:37  AssertionError: '<user expression 0>:1:1: use of undeclared identifier 'ExtStruct::static_inline'
06:59:37     1 | ExtStruct::static_inline
06:59:37       | ^' is not success
06:59:37  Config=arm64-/Users/ec2-user/jenkins/workspace/llvm.org/lldb-cmake-sanitized/lldb-build/bin/clang
06:59:37  ----------------------------------------------------------------------
06:59:37  Ran 1 test in 2.322s
06:59:37
```

Can't repro this locally so skipping on older macOS versions that the CI
is running.
The inheritance hierarchy for `llvm::ms_demangle::Node`
([doxygen](https://llvm.org/doxygen/structllvm_1_1ms__demangle_1_1Node.html))
is a bit more involved. One thing that's missing without RTTI is the
ability to determine if a node is a symbol, identifier, or type (or one
would need to check for every kind).

This PR adds support for `dyn_cast`, `isa`, and friends to
`llvm::ms_demangle::Node`. As the type already has a `kind()`, this
mainly adds `classof` to the nodes as well as some start and end markers
in the `NodeKind` enum.
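
To illustrate the idea of `classof` combined with start and end markers in a kind enum, here is a self-contained sketch. It does not use LLVM's headers: every enumerator, class, and helper name below is hypothetical, and `isaSketch`, `dynCastSketch`, and `castSketch` only mimic the behaviour of LLVM's `isa`, `dyn_cast`, and `cast`.

```cpp
#include <cassert>

// Hypothetical kinds; each family occupies a contiguous, bracketed range.
enum class NodeKind {
  Identifier,
  TypeFirst,
  PrimitiveType = TypeFirst,
  PointerType,
  TypeLast = PointerType,
  SymbolFirst,
  FunctionSymbol = SymbolFirst,
  VariableSymbol,
  SymbolLast = VariableSymbol,
};

struct Node {
  explicit Node(NodeKind K) : Kind(K) {}
  virtual ~Node() = default;
  NodeKind kind() const { return Kind; }

private:
  NodeKind Kind;
};

// Intermediate classes test a range, so no per-kind checks are needed.
struct TypeNode : Node {
  using Node::Node;
  static bool classof(const Node *N) {
    return N->kind() >= NodeKind::TypeFirst && N->kind() <= NodeKind::TypeLast;
  }
};

struct SymbolNode : Node {
  using Node::Node;
  static bool classof(const Node *N) {
    return N->kind() >= NodeKind::SymbolFirst &&
           N->kind() <= NodeKind::SymbolLast;
  }
};

// Leaf classes test a single kind.
struct PointerTypeNode : TypeNode {
  PointerTypeNode() : TypeNode(NodeKind::PointerType) {}
  static bool classof(const Node *N) {
    return N->kind() == NodeKind::PointerType;
  }
};

struct FunctionSymbolNode : SymbolNode {
  FunctionSymbolNode() : SymbolNode(NodeKind::FunctionSymbol) {}
  static bool classof(const Node *N) {
    return N->kind() == NodeKind::FunctionSymbol;
  }
};

// Simplified stand-ins for isa<>, dyn_cast<> and cast<>.
template <typename To> bool isaSketch(const Node *N) { return To::classof(N); }

template <typename To> const To *dynCastSketch(const Node *N) {
  return To::classof(N) ? static_cast<const To *>(N) : nullptr;
}

template <typename To> const To *castSketch(const Node *N) {
  assert(To::classof(N) && "cast to an incompatible node kind");
  return static_cast<const To *>(N);
}
```

With range markers in place, asking whether a node is any kind of type is a two-comparison check, and adding a new type kind only requires placing it between the markers.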
…lvm#141937)

RFC on discourse:
https://discourse.llvm.org/t/rfc-debug-info-for-coroutine-suspension-locations-take-2/86606

With this commit, we add `DILabel` debug info to the resume points of a
coroutine. Those labels can be used by debugging scripts to figure out
the exact line and column at which a coroutine was suspended by looking
up the current `__coro_index` value inside the coroutine's frame, and
then searching for the corresponding label inside the coroutine's resume
function.

The DWARF information generated for such a label looks like:

```
0x00000f71:     DW_TAG_label
                  DW_AT_name    ("__coro_resume_1")
                  DW_AT_decl_file       ("generator-example.cpp")
                  DW_AT_decl_line       (5)
                  DW_AT_decl_column     (3)
                  DW_AT_artificial      (true)
                  DW_AT_LLVM_coro_suspend_idx   (0x01)
                  DW_AT_low_pc  (0x00000000000019be)
```

The labels can be mapped to their corresponding `__coro_index` values
either via their naming convention `__coro_resume_<N>` or using the new
`DW_AT_LLVM_coro_suspend_idx` attribute. In gdb, those line numbers can
be looked up using `info line -function my_coroutine -label
__coro_resume_1`. LLDB unfortunately does not understand DW_TAG_label
debug information yet.

Given this is an artificial compiler-generated label, I did apply the
DW_AT_artificial tag to it. The DWARFv5 standard only allows that tag on
type and variable definitions, but this is a natural extension and was
also blessed in the RFC on discourse.

Also, this commit adds `DW_AT_decl_column` to labels, not only for
coroutines but also for normal C and C++ labels. While not strictly
necessary, I am doing so now because it would be harder to do so later
without breaking the binary LLVM-IR format.

Drive-by fixes: While reading the existing test cases to understand how
to write my own test case, I did a couple of small typo fixes and
comment improvements.
This patch is part of a series that adds origin-tracking to the debugify
source location coverage checks, allowing us to report symbolized stack
traces of the point where missing source locations appear.

This patch completes the feature, having debugify handle origin stack
traces by symbolizing them when an associated bug is found and printing
them into the JSON report file as part of the bug entry. This patch also
updates the script that parses the JSON report and creates a
human-readable HTML report, adding an "Origin" entry to the table that
contains an expandable textbox containing the symbolized stack trace.
The Buildkite CI was unintentionally disabled for a few weeks. This
patch fixes the CI jobs now that it has been re-enabled.
The use case for `__is_same_uncvref` seems rather dubious, since not a
single use case needed the `remove_cvref_t` to be applied to both of
the arguments. Removing the alias makes it clearer what actually
happens, since we're not using an internal name anymore and it's clear
what the `remove_cvref_t` should apply to.
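
The difference can be sketched in portable C++14. The alias names below (`RemoveCVRef`, `IsSameUncvref`, `IsSelf`) are hypothetical, and the definition of `IsSameUncvref` is an assumption about what the removed libc++ alias did, based only on its name and the description above.

```cpp
#include <type_traits>

// C++14-compatible equivalent of std::remove_cvref_t.
template <class T>
using RemoveCVRef =
    typename std::remove_cv<typename std::remove_reference<T>::type>::type;

// Assumed shape of the removed alias: strips cv/ref from BOTH arguments.
template <class T, class U>
using IsSameUncvref = std::is_same<RemoveCVRef<T>, RemoveCVRef<U>>;

// Explicit form: strips cv/ref only from the argument that needs it.
template <class Arg, class Self>
using IsSelf = std::is_same<RemoveCVRef<Arg>, Self>;

// Both agree when the second argument is already a plain type.
static_assert(IsSameUncvref<const int &, int>::value, "both sides stripped");
static_assert(IsSelf<const int &, int>::value, "only the first side stripped");

// They differ once the second argument carries qualifiers.
static_assert(IsSameUncvref<int, const int>::value, "alias hides the const");
static_assert(!IsSelf<int, const int>::value, "explicit form keeps the const");
```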
These changes were split off from llvm#146503.

This commit makes the output directories of libclc artefacts explicit.
It creates a variable for the final output directory -
LIBCLC_OUTPUT_LIBRARY_DIR - which has not changed. This allows future
changes to alter the output directory more simply, such as by pointing
it to somewhere inside clang's resource directory.

This commit also changes the output directory of each target's
intermediate builtins.*.bc files. They are now placed into each
respective libclc target's object directory, rather than the top-level
libclc binary directory. This should help keep the binary directory a
bit tidier.
This extension extends the subgroup block read and write functions
defined by `cl_intel_subgroups` (and, when supported,
`cl_intel_subgroups_char`, `cl_intel_subgroups_short`, and
`cl_intel_subgroups_long`) to support reading from and writing to
pointers to the `__local` memory address space in addition to pointers
to the `__global` memory address space.

It is already supported by the Intel OpenCL compiler.

Co-authored-by: Victor Mustya <[email protected]>
The prepare target depended on the output of a custom command, but did
not use the full path to that file. This tripped up CMake if the file
was removed, as it didn't know how to rebuild it.
Signed-off-by: Tikhomirova, Kseniya <[email protected]>
@KseniyaTikhomirova
Collaborator Author

Hi @tahonermann, @dvrogozh, @asudarsa, @aelovikov-intel, @sergey-semenov, I created another PR for these changes, since the initial (this) PR was created against an old branch made for internal review and differs too much from the branch published to the community.
The new PR I'd like to ask you to review is #4.

KseniyaTikhomirova pushed a commit that referenced this pull request Jul 29, 2025
Tracked at llvm#112294

This patch implements from [basic.link]p14 to [basic.link]p18 partially.

The explicitly missing parts are:
- Anything related to specializations.
- Decide if a pointer is associated with a TU-local value at compile
  time.
- [basic.link]p15.1.2 to decide if a type is TU-local.
- Diagnose if TU-local functions from other TU are collected to the
  overload set. See [basic.link]p19, the call to 'h(N::A{});' in
  translation unit #2

There are probably other implicitly missing parts, as the wording uses
"names" briefly several times. But to implement this precisely, we would
have to visit the whole AST, including Decls, Expressions and Types,
which may be harder to implement and would increase compilation time.
So I chose to implement the common parts.

It won't be too bad to miss some cases since we DIDN'T do any such
checks in the past 3 years. Any new check is an improvement. Given that
modules have been basically available since clang15 without such checks,
it would be user-unfriendly to give a hard error now. And there are
a lot of cases where violating the rule is actually just fine. So I
decided to emit these as warnings instead of hard errors.
bader pushed a commit that referenced this pull request Jul 31, 2025
Extend support in LLDB for WebAssembly. This PR adds a new Process
plugin (ProcessWasm) that extends ProcessGDBRemote for WebAssembly
targets. It adds support for WebAssembly's memory model with separate
address spaces, and the ability to fetch the call stack from the
WebAssembly runtime.

I have tested this change with the WebAssembly Micro Runtime (WAMR,
https://github.com/bytecodealliance/wasm-micro-runtime) which implements
a GDB debug stub and supports the qWasmCallStack packet.

```
(lldb) process connect --plugin wasm connect://localhost:4567
Process 1 stopped
* thread #1, name = 'nobody', stop reason = trace
    frame #0: 0x40000000000001ad
wasm32_args.wasm`main:
->  0x40000000000001ad <+3>:  global.get 0
    0x40000000000001b3 <+9>:  i32.const 16
    0x40000000000001b5 <+11>: i32.sub
    0x40000000000001b6 <+12>: local.set 0
(lldb) b add
Breakpoint 1: where = wasm32_args.wasm`add + 28 at test.c:4:12, address = 0x400000000000019c
(lldb) c
Process 1 resuming
Process 1 stopped
* thread #1, name = 'nobody', stop reason = breakpoint 1.1
    frame #0: 0x400000000000019c wasm32_args.wasm`add(a=<unavailable>, b=<unavailable>) at test.c:4:12
   1    int
   2    add(int a, int b)
   3    {
-> 4        return a + b;
   5    }
   6
   7    int
(lldb) bt
* thread #1, name = 'nobody', stop reason = breakpoint 1.1
  * frame #0: 0x400000000000019c wasm32_args.wasm`add(a=<unavailable>, b=<unavailable>) at test.c:4:12
    frame #1: 0x40000000000001e5 wasm32_args.wasm`main at test.c:12:12
    frame #2: 0x40000000000001fe wasm32_args.wasm
```

This PR is based on an unmerged patch from Paolo Severini:
https://reviews.llvm.org/D78801. I intentionally stuck to the
foundations to keep this PR small. I have more PRs in the pipeline to
support the other features/packets.

My motivation for supporting Wasm is to support debugging Swift compiled
to WebAssembly:
https://www.swift.org/documentation/articles/wasm-getting-started.html
KseniyaTikhomirova pushed a commit that referenced this pull request Aug 1, 2025
Pointers and GEP are untyped. SPIR-V requires structured OpAccessChain.
This means the backend will have to determine a good way to retrieve the
structured access from an untyped GEP. This is not a trivial problem,
and needs to be addressed to have a robust compiler.

The issue is that other workstreams rely on the access chain deduction
to work. So we have 2 options:
 - pause all dependent work until we have a good chain deduction.
 - submit this limited fix so we can work on both this and other
   features in parallel.

The choice we want to make is #2: submitting this **knowing this is not
a good** fix. It only increases the number of patterns we can work with,
thus allowing others to continue working on other parts of the backend.

This patch as-is has many limitations:
- It cannot robustly determine the depth of the structured access from a
GEP. Fixing this would require looking ahead at the full GEP chain.
- It cannot always figure out the correct access indices, especially
with dynamic indices. This will require frontend collaboration.

Because we know this is a temporary hack, this patch only impacts the
logical SPIR-V target. Physical SPIR-V, which can rely on pointer casts,
remains on the old method.

Related to llvm#145002
KseniyaTikhomirova pushed a commit that referenced this pull request Aug 12, 2025
…lvm#152156)

With this new A320 in-order core, we follow adding the
FeatureUseFixedOverScalableIfEqualCost feature to A510 and A520
(llvm#132246), which reaps the same code generation benefits of preferring
fixed over scalable when the cost is equal.

So when we have:
```
void foo(float* a, float* b, float* dst, unsigned n) {
    for (unsigned i = 0; i < n; ++i)
        dst[i] = a[i] + b[i];
}
```

When compiling without the feature enabled, we get:
```
...
    ld1b    { z0.b }, p0/z, [x0, x10]
    ld1b    { z2.b }, p0/z, [x1, x10]
    add     x12, x0, x10
    ldr     z1, [x12, #1, mul vl]
    add     x12, x1, x10
    ldr     z3, [x12, #1, mul vl]
    fadd    z0.s, z2.s, z0.s
    add     x12, x2, x10
    fadd    z1.s, z3.s, z1.s
    dech    x11
    st1b    { z0.b }, p0, [x2, x10]
    incb    x10, all, mul #2
    str     z1, [x12, #1, mul vl]
...
```

When compiling with the feature enabled, we get:
```
...
    ldp     q0, q1, [x12, #-16]
    ldp     q2, q3, [x11, #-16]
    subs    x13, x13, #8
    fadd    v0.4s, v2.4s, v0.4s
    fadd    v1.4s, v3.4s, v1.4s
    add     x11, x11, #32
    add     x12, x12, #32
    stp     q0, q1, [x10, #-16]
    add     x10, x10, #32

...
```
KseniyaTikhomirova pushed a commit that referenced this pull request Sep 29, 2025
Need this as `mlir/dialects/transform/smt.py` imports it:

```py
from .._transform_smt_extension_ops_gen import *
from .._transform_smt_extension_ops_gen import _Dialect
```